Abstract. We investigate measures of complexity of function classes 
based on continuity moduli of Gaussian and Rademacher processes. For 
Gaussian processes, we obtain bounds on the continuity modulus on 
the convex hull of a function class in terms of the same quantity for 
the class itself. We also obtain new bounds on generalization error in 
terms of localized Rademacher complexities. This allows us to prove new 
results about generalization performance for convex hulls in terms of 
characteristics of the base class. As a byproduct, we obtain a simple 
proof of some of the known bounds on the entropy of convex hulls. 



1 Introduction 

Convex hulls of function classes have become of great interest in Machine Learn- 
ing since the introduction of AdaBoost and other methods of combining classi- 
fiers. The most commonly used measure of complexity of convex hulls is based 
on covering numbers (or metric entropies). The first bound on the entropy of 
the convex hull of a set in a Hilbert space was obtained by Dudley [8] and later 
refined by Ball and Pajor [1]; a different proof was given independently by 
van der Vaart and Wellner [19]. These authors considered the case of polynomial 
growth of the covering numbers of the base class. Sharp bounds in the case of 
exponential growth of the covering numbers of the base class, as well as extensions 
of previously known results to the case of Banach spaces, were obtained later. 

In Machine Learning, however, the quantities of primary importance for 
determining the generalization performance are not the entropies themselves but 
rather localized Gaussian or Rademacher complexities of the function classes 
[12, 2]. These quantities are closely related to continuity moduli of the 
corresponding stochastic processes. 

Our main purpose in this paper is to provide an easy bound on the continuity 
modulus of stochastic processes like Rademacher or Gaussian processes on the 
convex hull of a class in terms of the continuity modulus on the class itself. We 
combine this result with some new bounds on the generalization error in function 
learning problems based on localized Rademacher complexities. This allows us 
to bound the generalization error for convex hulls in terms of characteristics of 
the base class. 

In addition to this, we use the bounds on continuity moduli on convex hulls 
to give very simple proofs of some previously known results on the entropy of 
such classes. 

2 Continuity Modulus on Convex Hulls 

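Let $\mathcal{F}$ be a subset of a Hilbert space $H$ and let $W$ denote an isonormal Gaussian 
process defined on $H$, that is, a collection $(W(h))_{h \in H}$ of Gaussian random variables 
indexed by $H$ such that 

$$\forall h \in H,\ \mathbb{E}[W(h)] = 0 \quad \text{and} \quad \forall h, h' \in H,\ \mathbb{E}[W(h)W(h')] = \langle h, h' \rangle_H .$$

We define the modulus of continuity of the process $W$ as 

$$\omega(\mathcal{F}, \delta) := \mathbb{E}\Big[ \sup_{f, g \in \mathcal{F},\ \|f - g\| \le \delta} |W(f) - W(g)| \Big] .$$

Let $\mathcal{F}_\epsilon$ denote a minimal $\epsilon$-net of $\mathcal{F}$, i.e. a subset of $\mathcal{F}$ of minimal cardinality 
such that $\mathcal{F}$ is contained in the union of the balls of radius $\epsilon$ with centers in $\mathcal{F}_\epsilon$. 
Let $\mathcal{F}^\epsilon$ denote a maximal $\epsilon$-separated subset of $\mathcal{F}$, i.e. a subset of $\mathcal{F}$ of maximal 
cardinality such that the distance between any two points in this subset is larger 
than or equal to $\epsilon$. The $\epsilon$-covering number of $\mathcal{F}$ is then defined as 

$$N(\mathcal{F}, \epsilon) := N_H(\mathcal{F}, \epsilon) = |\mathcal{F}_\epsilon| ,$$

and the $\epsilon$-entropy is $H(\mathcal{F}, \epsilon) = \log N(\mathcal{F}, \epsilon)$. 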
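
The relation between separated subsets and nets can be illustrated on a finite point set. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, points live in a Euclidean space, and the greedy procedure returns a separated subset that is maximal with respect to inclusion, not necessarily one of maximal cardinality. Such a subset is still a net at the same scale, because any point that was not selected lies within the separation distance of a selected one.

```python
import math

def greedy_separated_subset(points, eps):
    """Greedily select points that are pairwise at distance >= eps.

    The result is maximal with respect to inclusion (no further point of
    `points` can be added), which is weaker than maximal cardinality.
    """
    chosen = []
    for p in points:
        if all(math.dist(p, q) >= eps for q in chosen):
            chosen.append(p)
    return chosen

def is_net(points, centers, eps):
    """Check that every point lies within distance eps of some center."""
    return all(any(math.dist(p, c) <= eps for c in centers) for p in points)

def is_separated(centers, eps):
    """Check that distinct centers are pairwise at distance >= eps."""
    return all(
        math.dist(a, b) >= eps
        for i, a in enumerate(centers)
        for b in centers[i + 1:]
    )
```

The size of the greedy subset is an upper bound on the covering number at the same scale, since the subset itself is a net.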

2.1 Main Result 

Our main result relates the continuity modulus on the convex hull of a set T to 
the continuity modulus on this set. 

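Theorem 1. We have for all $\delta > 0$, 

$$\omega(\mathrm{conv}(\mathcal{F}), \delta) \le \inf_{\epsilon > 0} \Big( 2\,\omega(\mathcal{F}, \epsilon) + \delta \sqrt{N(\mathcal{F}, \epsilon)} \Big) .$$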

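Proof. Let $\epsilon > 0$, let $L$ be the linear span of $\mathcal{F}_\epsilon$ and let $\Pi_L$ be the orthogonal projection 
on $L$. We have for all $f \in \mathcal{F}$, 

$$f = \Pi_L(f) + \Pi_{L^\perp}(f) ,$$

so that 

$$\omega(\mathrm{conv}(\mathcal{F}), \delta) \le \mathbb{E}\Big[ \sup_{f, g \in \mathrm{conv}(\mathcal{F}),\ \|f - g\| \le \delta} |W(\Pi_L f) - W(\Pi_L g)| \Big] + \mathbb{E}\Big[ \sup_{f, g \in \mathrm{conv}(\mathcal{F}),\ \|f - g\| \le \delta} |W(\Pi_{L^\perp} f) - W(\Pi_{L^\perp} g)| \Big] .$$

Now since for any orthogonal projection $\Pi$, $\|\Pi(f) - \Pi(g)\| \le \|f - g\|$, we have 

$$\omega(\mathrm{conv}(\mathcal{F}), \delta) \le \omega(\Pi_L \mathrm{conv}(\mathcal{F}), \delta) + \omega(\Pi_{L^\perp} \mathrm{conv}(\mathcal{F}), \delta) .$$

Moreover, we have $\Pi\,\mathrm{conv}(\mathcal{F}) = \mathrm{conv}(\Pi \mathcal{F})$ by linearity of the orthogonal 
projection so that 

$$\omega(\mathrm{conv}(\mathcal{F}), \delta) \le \omega(\mathrm{conv}(\Pi_L \mathcal{F}), \delta) + \omega(\mathrm{conv}(\Pi_{L^\perp} \mathcal{F}), \delta) .$$

This gives the first inequality. Next we have 

$$\omega(\Pi_L \mathrm{conv}(\mathcal{F}), \delta) \le \omega(L, \delta) ,$$

and by linearity of $W$, 

$$\omega(L, \delta) = \mathbb{E}\Big[ \sup_{f \in L,\ \|f\| \le \delta} |W(f)| \Big] \le \delta\, \mathbb{E}\Big[ \sup_{y \in \mathbb{R}^d,\ \|y\|_{\mathbb{R}^d} \le 1} \langle Z, y \rangle \Big] ,$$

where $Z$ is a standard normal vector in $\mathbb{R}^d$ (with $d = \dim L$ and $\|\cdot\|_{\mathbb{R}^d}$ the 
euclidean norm in $\mathbb{R}^d$). This gives 

$$\omega(L, \delta) \le \delta\, \mathbb{E}\big[ \|Z\|_{\mathbb{R}^d} \big] \le \delta \sqrt{\dim L} \le \delta \sqrt{N(\mathcal{F}, \epsilon)} .$$

We also get 

$$\omega(\Pi_{L^\perp} \mathrm{conv}(\mathcal{F}), \delta) \le 2\, \mathbb{E}\Big[ \sup_{f \in \mathrm{conv}(\mathcal{F})} |W(\Pi_{L^\perp} f)| \Big] .$$

Since $\Pi_{L^\perp}$ is linear, the supremum is attained at elements of $\mathcal{F}$, that is 

$$\omega(\Pi_{L^\perp} \mathrm{conv}(\mathcal{F}), \delta) \le 2\, \mathbb{E}\Big[ \sup_{f \in \mathcal{F}} |W(\Pi_{L^\perp} f)| \Big] .$$

Now for each $f \in \mathcal{F}$, let $g$ be the closest point to $f$ in $\mathcal{F}_\epsilon$. Then we have 
$\|f - g\| \le \epsilon$ and $g \in L \cap \mathcal{F}$ so that $\Pi_{L^\perp} g = 0$ and thus 

$$\omega(\Pi_{L^\perp} \mathrm{conv}(\mathcal{F}), \delta) \le 2\, \mathbb{E}\Big[ \sup_{f, g \in \mathcal{F},\ \|f - g\| \le \epsilon} |W(\Pi_{L^\perp} f) - W(\Pi_{L^\perp} g)| \Big] .$$

Now since $\Pi_{L^\perp}$ is a contraction, using Slepian's lemma (see [13], Theorem 3.15, 
page 78) we get 

$$\omega(\Pi_{L^\perp} \mathrm{conv}(\mathcal{F}), \delta) \le 2\, \mathbb{E}\Big[ \sup_{f, g \in \mathcal{F},\ \|f - g\| \le \epsilon} |W(f) - W(g)| \Big] = 2\,\omega(\mathcal{F}, \epsilon) .$$

This concludes the proof. 

□ 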
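
The step bounding the expected norm of a standard normal vector by the square root of the dimension can be checked numerically. The sketch below assumes the classical closed form for the mean of a chi distribution with d degrees of freedom, namely sqrt(2) Γ((d+1)/2) / Γ(d/2); the function name is hypothetical and the code is an illustration, not part of the proof.

```python
import math

def expected_gaussian_norm(d):
    """Mean Euclidean norm of a standard normal vector in R^d.

    Uses the mean of the chi distribution with d degrees of freedom,
    computed through log-gamma for numerical stability.
    """
    log_ratio = math.lgamma((d + 1) / 2.0) - math.lgamma(d / 2.0)
    return math.sqrt(2.0) * math.exp(log_ratio)

def norm_bound_holds(max_dim):
    """Check E||Z|| <= sqrt(d) for d = 1, ..., max_dim."""
    return all(
        expected_gaussian_norm(d) <= math.sqrt(d) for d in range(1, max_dim + 1)
    )
```

The inequality is the usual consequence of Jensen's inequality, since the expected squared norm equals d.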



Note that Theorem 1 allows us to give a positive answer to a question raised 
by Dudley [8]. Indeed, we can prove that the convex hull of a uniformly Donsker 
class is uniformly Donsker. Due to lack of space, we do not give the details here. 

2.2 Examples 

As an application of Theorem 1, we will derive bounds on the continuity modulus 
of convex hulls of classes for which we know the rate of growth of the entropy. 
By Dudley's entropy bound (see [13], Theorem 11.17, page 321) we have 

$$\omega(\mathcal{F}, \delta) \le K \int_0^{\delta} H^{1/2}(\mathcal{F}, u)\, du .$$

We will also use below the following version of this result (that easily follows 
from Dudley's chaining argument and is well known) 



for all $\epsilon > \delta$. 

We first consider the case when the entropy of the base class grows logarith- 
mically. 

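Example 1. If for all $\epsilon > 0$, 

$$N(\mathcal{F}, \epsilon) \le K \epsilon^{-V} ,$$

then for all $\delta > 0$, 

$$\omega(\mathrm{conv}(\mathcal{F}), \delta) \le K \delta^{2/(2+V)} \log^{V/(2+V)} \delta^{-1} .$$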
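Proof. We have from Theorem 1 

$$\omega(\mathrm{conv}(\mathcal{F}), \delta) \le \inf_{\epsilon > 0} \Big( K \int_0^{\epsilon} \log^{1/2} u^{-1}\, du + \delta \epsilon^{-V/2} \Big) .$$

Choosing 

$$\epsilon = \delta^{2/(2+V)} \log^{-2/(2+V)} \delta^{-1} ,$$

we obtain for $\delta < 1$, 

$$\omega(\mathrm{conv}(\mathcal{F}), \delta) \le K \delta^{2/(2+V)} \log^{V/(2+V)} \delta^{-1} .$$

□ 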
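
The trade-off behind this proof, a continuity term that grows with the scale of the net against a dimension term that shrinks with it, can be explored numerically. The following sketch is an illustration under simplifying assumptions that are not in the paper: all constants are set to 1, the entropy integral is replaced by the proxy eps * sqrt(log(e / eps)), and the infimum is taken over a finite geometric grid. The function name and its parameters are hypothetical.

```python
import math

def theorem1_tradeoff(delta, V, num_grid=400):
    """Minimize a proxy of the Theorem 1 bound over a grid of scales eps.

    continuity term: eps * sqrt(log(e / eps))  (proxy for the entropy integral)
    dimension term:  delta * eps ** (-V / 2)   (delta times sqrt of covering number)
    """
    best = math.inf
    for j in range(1, num_grid + 1):
        eps = 10.0 ** (-6.0 * j / num_grid)  # geometric grid in (1e-6, 1)
        continuity_term = eps * math.sqrt(math.log(math.e / eps))
        dimension_term = delta * eps ** (-V / 2.0)
        best = min(best, continuity_term + dimension_term)
    return best
```

Comparing the returned value with delta ** (2 / (2 + V)) for several values of delta gives a rough feel for the polynomial part of the rate; the logarithmic factor depends on the constants that this sketch ignores.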



Although the main term in the above bound is correct, we obtain a superfluous 
logarithm. This logarithm can be removed if one uses directly the entropy integral 
in combination with results on the entropy of the convex hull of such classes 
[1, 19, 17]. At the moment of this writing, we do not know a simple proof of this 
fact that does not rely upon the bounds on the entropy of convex hulls. 

Now we consider the case when the entropy of the base class has polynomial 
growth. In this case, we shall distinguish several situations: when the exponent is 
larger than 2, the class is no longer pre-Gaussian which means that the continuity 
modulus is unbounded. However, it is possible to study the continuity modulus 
of a restricted class. Here we consider the convex hull of a $\delta$-separated subset of 
the base class, for which the continuity modulus is bounded when computed at 
a scale proportional to $\delta$. 

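Example 2. If for all $\epsilon > 0$, 

$$H(\mathcal{F}, \epsilon) \le K \epsilon^{-V} ,$$

then for all $\delta > 0$, for $0 < V < 2$, 

$$\omega(\mathrm{conv}(\mathcal{F}), \delta) \le K \log^{1/2 - 1/V} \delta^{-1} ,$$

for $V = 2$, 

$$\omega(\mathrm{conv}(\mathcal{F}^{\delta/4}), \delta) \le K \log \delta^{-1} ,$$

and for $V > 2$, 

$$\omega(\mathrm{conv}(\mathcal{F}^{\delta/4}), \delta) \le K \delta^{1 - V/2} .$$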
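Proof. We have from Theorem 1, for $\epsilon > \delta/4$, 



For $0 < V < 2$, this gives 



Choosing 

$$\epsilon = K^{1/V} \log^{-1/V} \delta^{-1} ,$$

we obtain for $\delta$ small enough 

$$\omega(\mathrm{conv}(\mathcal{F}), \delta) \le K \log^{(V-2)/2V} \delta^{-1} .$$

For $V = 2$, we get 



Taking $\epsilon = 1/4$ we get for $\delta$ small enough 

$$\omega(\mathrm{conv}(\mathcal{F}^{\delta/4}), \delta) \le K \log \delta^{-1} .$$

For $V > 2$, we get 

$$\omega(\mathrm{conv}(\mathcal{F}^{\delta/4}), \delta) \le \inf_{\epsilon} \Big[ K \big( \delta^{(2-V)/2} - \epsilon^{(2-V)/2} \big) + \delta \exp\big( K \epsilon^{-V} / 2 \big) \Big] .$$

Taking $\epsilon \to \infty$, we obtain 

$$\omega(\mathrm{conv}(\mathcal{F}^{\delta/4}), \delta) \le K \delta^{(2-V)/2} .$$

□ 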

3 Generalization Error Bounds 
3.1 Results 

We begin this section with a general bound that relates the error of the function 
minimizing the empirical risk to a local measure of complexity of the class, which 
is the same in spirit as the bound in [12]. 

Let $(S, \mathcal{A})$ be a measurable space and let $X_1, \dots, X_n$ be $n$ i.i.d. random 
variables in this space with common distribution $P$. $P_n$ will denote the empirical 
measure based on the sample, 

$$P_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i} .$$

In what follows, $H = L_2(P_n)$ and we are using the notations of Section 2. 

We consider a class $\mathcal{F}$ of measurable functions defined on $S$ with values in 
$[0, 1]$. We assume in what follows that $\mathcal{F}$ also satisfies standard measurability 
conditions used in the theory of empirical processes as in [9, 19]. 
We define 

$$R_n(f) := \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i f(X_i) ,$$

where $\varepsilon_1, \dots, \varepsilon_n$ are i.i.d. Rademacher random variables, and let $\psi_n$ be an 
increasing concave (possibly data-dependent random) function with $\psi_n(0) = 0$ such that 

$$\mathbb{E}_\varepsilon \Big[ \sup_{P_n f \le r} |R_n(f)| \Big] \le \psi_n(\sqrt{r}), \quad \forall r > 0 .$$
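
For a finite class and a very small sample, the localized Rademacher average on the left-hand side can be computed exactly by enumerating all sign vectors. The sketch below is an illustration only; the function name and the representation of the class as a list of value vectors (one row per function, one entry per sample point) are assumptions made for the example, and the enumeration is exponential in the sample size.

```python
from itertools import product

def localized_rademacher_average(values, r):
    """Exact E_eps sup_{P_n f <= r} |R_n(f)| for a finite class.

    values: list of rows; row j holds (f_j(X_1), ..., f_j(X_n)).
    r:      localization radius applied to the empirical mean P_n f.
    Returns 0.0 when no function satisfies P_n f <= r.
    """
    local = [row for row in values if sum(row) / len(row) <= r]
    if not local:
        return 0.0
    n = len(local[0])
    total = 0.0
    for signs in product((-1, 1), repeat=n):
        # supremum over the localized class of |(1/n) sum_i eps_i f(X_i)|
        total += max(
            abs(sum(s * v for s, v in zip(signs, row))) / n for row in local
        )
    return total / 2 ** n
```

Any function of the square root of r that dominates this quantity for every r, and that is concave and non-decreasing with value zero at zero, is an admissible choice in the condition above.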



Let $\hat r_n$ be the largest solution of the equation 

$$r = \psi_n(\sqrt{r}) . \qquad (1)$$

The solution $\hat r_n$ of (1) gives what is usually called zero error rate for the class 
$\mathcal{F}$ [12], i.e. the bound for $Pf$ given that $P_n f = 0$. 

The bounds we obtain below are data-dependent and they do not require 
any structural assumptions on the class (such as VC conditions or entropy 
conditions). Note that $\hat r_n$ is determined only by the restriction of the class $\mathcal{F}$ to the 
sample $(X_1, \dots, X_n)$. 
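
The largest solution of the fixed-point equation (1) can be approximated by simple iteration. The sketch below is an illustration under assumptions not stated in the paper: the function supplied by the caller is concave and non-decreasing with value zero at zero, and the starting point lies above the largest solution, so that the iterates decrease monotonically towards it. The function name and parameters are hypothetical.

```python
import math

def largest_fixed_point(psi, r_start=1.0, tol=1e-12, max_iter=10_000):
    """Approximate the largest solution of r = psi(sqrt(r)).

    Iterates r <- psi(sqrt(r)) starting from r_start, which is assumed to be
    at least as large as the solution being sought.
    """
    r = r_start
    for _ in range(max_iter):
        r_next = psi(math.sqrt(r))
        if abs(r_next - r) <= tol:
            return r_next
        r = r_next
    return r
```

For the linear choice psi(x) = c * x the equation reads r = c * sqrt(r), whose largest solution is c squared, which gives a quick sanity check of the iteration.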



Theorem 2. If $\psi_n$ is a non-decreasing concave function and $\psi_n(0) = 0$ then 
there exists $K > 0$ such that with probability at least $1 - 2e^{-t}$ for all $f \in \mathcal{F}$ 

$$Pf \le K \Big( P_n f + \hat r_n + \frac{t + \log \log n}{n} \Big) . \qquad (2)$$



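It is most common to estimate the expectation of Rademacher processes via the 
entropy integral (Theorem 2.2.4 in [19]): 

$$\mathbb{E}_\varepsilon \Big[ \sup_{P_n f \le \delta} |R_n(f)| \Big] \le \frac{\sqrt{3}}{\sqrt{n}} \int_0^{\sqrt{\delta}/2} H^{1/2}(\mathcal{F}, u)\, du ,$$

which means one can choose $\psi_n(\sqrt{\delta})$ as the right hand side of the above bound. 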
This approach was used for instance in [2]. 

Our goal here will be to apply the bound of Theorem 2 to the function 
learning problem in the convex hull of a given class. 

Let $\mathcal{G}$ be a class of measurable functions from $S$ into $[0, 1]$. Let $g_0 \in \mathrm{conv}(\mathcal{G})$ 
be an unknown target function. The goal is to learn $g_0$ based on the data 
$(X_1, g_0(X_1)), \dots, (X_n, g_0(X_n))$. We introduce $\hat g_n$ defined as 

$$\hat g_n := \arg\min_{g \in \mathrm{conv}(\mathcal{G})} P_n |g - g_0| ,$$

which in principle can be computed from the data. 
We introduce the function $\psi_n(\mathcal{G}, \delta)$ defined as 

$$\psi_n(\mathcal{G}, \delta) := \sqrt{\frac{\pi}{2n}}\, \inf_{\epsilon > 0} \Big( 2\,\omega(\mathcal{G}, \epsilon) + \delta \sqrt{N(\mathcal{G}, \epsilon)} \Big) .$$

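Corollary 1. Let $\hat r_n(\mathcal{G})$ be the largest solution of the equation 

$$r = \psi_n(\mathcal{G}, \sqrt{r}) .$$

Then there exists $K > 0$ such that for all $g_0 \in \mathrm{conv}(\mathcal{G})$ the following inequality 
holds with probability at least $1 - 2e^{-t}$: 

$$P |\hat g_n - g_0| \le K \Big( \hat r_n(\mathcal{G}) + \frac{t + \log \log n}{n} \Big) .$$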

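Proof. Let $\mathcal{F} = \{ |g - g_0| : g \in \mathrm{conv}(\mathcal{G}) \}$. Note that $\psi_n(\mathcal{G}, \delta)$ is concave 
non-decreasing (as the infimum of linear functions) and $\psi_n(\mathcal{G}, 0) = 0$; it can thus be 
used in Theorem 2. We obtain (using bound (4.8) on page 97 of [13]) 

$$\mathbb{E}_\varepsilon \Big[ \sup_{P_n f \le r} |R_n(f)| \Big] \le \sqrt{\frac{\pi}{2n}}\, \mathbb{E} \Big[ \sup_{P_n f \le r} |W_{P_n}(f)| \Big] \le \sqrt{\frac{\pi}{2n}}\, \omega(\mathrm{conv}(\mathcal{G}), \sqrt{r}) \le \psi_n(\mathcal{G}, \sqrt{r}) ,$$

where in the last step we used Theorem 1. To complete the proof, it is enough 
to notice that $P_n |\hat g_n - g_0| = 0$ (since $g_0 \in \mathrm{conv}(\mathcal{G})$) and to use the bound of 
Theorem 2. □ 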

A simple application of the above corollary in combination with the bounds of 
Examples 1 and 2 gives, for instance, the following rates. If the covering numbers 
of the base class grow polynomially, i.e. for some $V > 0$, 

$$N(\mathcal{G}, \epsilon) \le K \epsilon^{-V} ,$$

then we obtain $\hat r_n$ of the order of 

$$n^{-\frac{1}{2} \cdot \frac{2+V}{1+V}} .$$

This can be compared with the main result in JS] . If the entropy is polynomial 
with exponent $0 < V < 2$, $\hat r_n$ is of the order of 

$$n^{-1/2} \log^{1/2 - 1/V} n .$$
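
The rate stated above for polynomially growing covering numbers can be recovered by a short fixed-point computation. The sketch below is an illustration under simplifying assumptions: logarithmic factors and constants are dropped, so that the complexity function evaluated at the square root of r is modelled as n ** (-1/2) * r ** (1 / (2 + V)). The function names are hypothetical.

```python
import math

def polynomial_covering_rate(n, V):
    """Closed-form solution r = n ** (-(2 + V) / (2 * (1 + V)))."""
    return n ** (-(2.0 + V) / (2.0 * (1.0 + V)))

def simplified_rhs(r, n, V):
    """Right-hand side n^(-1/2) * (sqrt(r))^(2 / (2 + V)), log factors dropped."""
    return n ** -0.5 * math.sqrt(r) ** (2.0 / (2.0 + V))

def is_fixed_point(n, V):
    """Check that the closed form solves r = simplified_rhs(r, n, V)."""
    r = polynomial_covering_rate(n, V)
    return math.isclose(r, simplified_rhs(r, n, V), rel_tol=1e-9)
```

As V tends to zero the exponent tends to -1, and as V grows it tends to -1/2, which interpolates between the fast and slow regimes.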



3.2 Additional Proofs 

Our main goal in this section is to prove Theorem 2. 
Denote 

$$l(\delta) = 2 \log \Big( \frac{\pi}{\sqrt{6}} \log_2 \frac{2}{\delta} \Big)$$
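and define $U(\delta)$ as the largest solution of the equation 

$$U = \delta + 8\, \mathbb{E}_\varepsilon \Big[ \sup_{P_n f \le U} |R_n(f)| \Big] + \Big( \frac{2\delta (t + l(\delta))}{n} \Big)^{1/2} + \frac{10 (t + l(\delta))}{3n} , \qquad (3)$$

while $r(\delta)$ is the largest solution of the equation 

$$r = \delta + 8\, \mathbb{E}_\varepsilon \Big[ \sup_{P_n f \le U(2r)} |R_n(f)| \Big] + \Big( \frac{4r (t + l(2r))}{n} \Big)^{1/2} + \frac{10 (t + l(2r))}{3n} . \qquad (4)$$

Notice that the construction of $r(\delta)$ depends only on the sample $(X_1, \dots, X_n)$ 
and the restriction of the class $\mathcal{F}$ to the sample. 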

Theorem 3. With probability at least $1 - 2e^{-t}$ for all $f \in \mathcal{F}$ 

$$Pf \le r(P_n f) . \qquad (5)$$
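Proof. We define $\delta_k = 2^{-k}$ for $k \ge 0$, and consider a sequence of classes 

$$\mathcal{F}_k = \{ f \in \mathcal{F} : \delta_{k+1} < Pf \le \delta_k \} .$$

If we denote 

$$R_k = \mathbb{E}_\varepsilon \sup_{f \in \mathcal{F}_k} |R_n(f)| ,$$

then the symmetrization inequality implies that 

$$\mathbb{E} \Big[ \sup_{f \in \mathcal{F}_k} |P_n f - Pf| \Big] \le 2\, \mathbb{E}[R_k] ,$$

which in combination with Theorem 3 in [4] (with $P(f - Pf)^2 \le Pf^2 \le Pf \le \delta_k$) 
implies that with probability at least $1 - e^{-t}$ for all $f \in \mathcal{F}_k$ 

$$|P_n f - Pf| \le 4\, \mathbb{E}[R_k] + \Big( \frac{2 \delta_k t}{n} \Big)^{1/2} + \frac{4t}{3n} .$$

Theorem 16 in [3] gives that with probability at least $1 - e^{-t}$ 

$$\mathbb{E}[R_k] \le \Big( \Big( \frac{t}{2n} \Big)^{1/2} + R_k^{1/2} \Big)^2 \le \frac{t}{n} + 2 R_k .$$

Therefore, with probability at least $1 - 2e^{-t}$ for all $f \in \mathcal{F}_k$ 

$$|P_n f - Pf| \le 8 R_k + \Big( \frac{2 \delta_k t}{n} \Big)^{1/2} + \frac{10 t}{3n} .$$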

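Finally, replacing $t$ by $t + l(\delta_k)$ and applying the union bound we get that with 
probability at least $1 - 2e^{-t}$ for all $k \ge 0$ and for all $f \in \mathcal{F}_k$ 

$$|P_n f - Pf| \le 8 R_k + \Big( \frac{2 \delta_k (t + l(\delta_k))}{n} \Big)^{1/2} + \frac{10 (t + l(\delta_k))}{3n} . \qquad (6)$$

If we denote 

$$U_k = \delta_k + 8 R_k + \Big( \frac{2 \delta_k (t + l(\delta_k))}{n} \Big)^{1/2} + \frac{10 (t + l(\delta_k))}{3n} ,$$

then on this event for any fixed $k$ and for all $f \in \mathcal{F}_k$, $P_n f \le U_k$ and, hence, 

$$R_k \le \mathbb{E}_\varepsilon \Big[ \sup_{P_n f \le U_k} |R_n(f)| \Big] ,$$

which can be rewritten in terms of $U_k$ as 

$$U_k \le \delta_k + 8\, \mathbb{E}_\varepsilon \Big[ \sup_{P_n f \le U_k} |R_n(f)| \Big] + \Big( \frac{2 \delta_k (t + l(\delta_k))}{n} \Big)^{1/2} + \frac{10 (t + l(\delta_k))}{3n} .$$

This means that $U_k \le U(\delta_k)$, where $U(\delta)$ is defined in (3). Finally, (6) implies 
that for all $k$ and $f \in \mathcal{F}_k$ 

$$Pf \le P_n f + 8\, \mathbb{E}_\varepsilon \Big[ \sup_{P_n f \le U(\delta_k)} |R_n(f)| \Big] + \Big( \frac{2 \delta_k (t + l(\delta_k))}{n} \Big)^{1/2} + \frac{10 (t + l(\delta_k))}{3n} .$$

If $f \in \mathcal{F}_k$, then $\delta_k \le 2 Pf$, which proves the theorem. 

□ 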



Notice that if we replace the right-hand sides of (3) and (4) by upper bounds, 
we only increase the value of the solutions and the theorem remains true for these 
new solutions. Moreover, since the solution of (4) is necessarily larger than $1/n$, 
it is enough to consider (3) only for $\delta > 1/n$. So assuming that we have the 
bound 

$$\mathbb{E}_\varepsilon \Big[ \sup_{P_n f \le r} |R_n(f)| \Big] \le \psi_n(\sqrt{r}) ,$$

we can replace (using that $2\sqrt{ab} \le a + b$) (3) and (4) by 

$$U = K_1 \big( \delta + \psi_n(\sqrt{U}) + r_0 \big) , \qquad (7)$$

$$r = \delta + K_2 \big( \psi_n(\sqrt{U_1(2r)}) + \sqrt{r r_0} + r_0 \big) , \qquad (8)$$

where $r_0 = (t + \log \log n)/n$. The solutions of those equations are denoted 
respectively $U_1(\delta)$ and $r_1(\delta)$. 

Proof of Theorem 2. Let $\alpha < 1$ and consider $k$ non-negative functions $\phi_i$ 
satisfying one of the following conditions 

$$\forall x > 0,\ \forall C > 1, \quad \phi_i(Cx) \le C^{\alpha} \phi_i(x) , \qquad (9)$$

or 

$$\phi_i(x) \text{ is non-increasing for } x > 0 . \qquad (10)$$

Define now for each $i = 1, \dots, k$, $u_i$ as the largest solution of the equation 

$$u = \phi_i(u) ,$$

(assuming the existence of the solutions). 

Note that from the conditions (9) or (10), we obtain for all $c > 0$ and all $C > 1$ 

$$\phi_i(C(u_i + c)) \le C^{\alpha} (u_i + c) . \qquad (11)$$

We thus deduce that the largest solution $u^*$ of the equation 

$$u = \sum_{i=1}^{k} \phi_i(u)$$

satisfies $u^* \le C \sum_{i=1}^{k} u_i$ for some large enough $C$. 

It is easy to see that the right-hand side of (7) is a sum of functions satisfying 
(11). Indeed, we have by the concavity of $\psi_n$ (and $\psi_n(0) = 0$) and the 
definition of $\hat r_n$, 

$$\psi_n\big(\sqrt{C(\hat r_n + c)}\big) \le \sqrt{C}\, \psi_n\big(\sqrt{\hat r_n + c}\big) \le \sqrt{C}\, (\hat r_n + c) .$$

The above reasoning thus proves that $U_1(\delta) \le K(\delta + \hat r_n + r_0)$. 

We can thus replace equation (8) by the following, whose solution $r_2(\delta)$ will 
upper bound $r_1(\delta)$: 

$$r = \delta + K_1 \Big( \psi_n\big(\sqrt{K_2 (r + \hat r_n + r_0)}\big) + \sqrt{r r_0} + r_0 \Big) .$$

Once again we can check that the right-hand side is a sum of functions 
satisfying (11). The same reasoning as before proves that 

$$r(\delta) \le r_2(\delta) \le K(\delta + \hat r_n + r_0) ,$$

which finishes the proof. 

□ 



4 Entropy of Convex Hulls 

4.1 Relating Entropy With Continuity Modulus 

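By Sudakov's minoration (see [13], Theorem 3.18, page 80) we have 

$$\sup_{\epsilon > 0} \epsilon H^{1/2}(\mathcal{F}, \epsilon) \le K\, \mathbb{E} \Big[ \sup_{f \in \mathcal{F}} |W(f)| \Big] .$$

Let $B(f, \delta)$ be the ball centered in $f$ of radius $\delta$. We define 

$$H(\mathcal{F}, \delta, \epsilon) := \sup_{f \in \mathcal{F}} H(B(f, \delta) \cap \mathcal{F}, \epsilon) .$$

The following lemma relates the entropy of $\mathcal{F}$ with the modulus of continuity 
of the process $W$. This type of bound is well known, but we give 
the proof for completeness. 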

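Lemma 1. Assume $\mathcal{F}$ is of diameter 1. For all integer $k$ we have 

$$H^{1/2}(\mathcal{F}, 2^{-k}) \le K \sum_{i=0}^{k} 2^{i}\, \omega(\mathcal{F}, 2^{1-i}) .$$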



This can also be written 

$$H^{1/2}(\mathcal{F}, \delta) \le K \int_{\delta}^{1} u^{-2}\, \omega(\mathcal{F}, u)\, du .$$



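Proof. We have 

$$\omega(\mathcal{F}, \delta) = \mathbb{E} \Big[ \sup_{f, g \in \mathcal{F},\ \|f - g\| \le \delta} |W(f) - W(g)| \Big] \ge \sup_{f \in \mathcal{F}} \mathbb{E} \Big[ \sup_{g \in B(f, \delta) \cap \mathcal{F}} |W(f) - W(g)| \Big] \ge K^{-1} \sup_{f \in \mathcal{F}} \sup_{\epsilon > 0} \epsilon H^{1/2}(B(f, \delta) \cap \mathcal{F}, \epsilon) ,$$

so that we obtain 

$$\epsilon H^{1/2}(\mathcal{F}, \delta, \epsilon) \le K\, \omega(\mathcal{F}, \delta) .$$

Notice that we can construct a $2^{-k}$ covering of $\mathcal{F}$ by covering $\mathcal{F}$ by $N(\mathcal{F}, 1)$ 
balls of radius 1 and then covering the intersection of each of these balls with 
$\mathcal{F}$ with $N(B(f, 1) \cap \mathcal{F}, 1/2)$ balls of radius $1/2$ and so on. We thus have 

$$N(\mathcal{F}, 2^{-k}) \le \prod_{i=0}^{k} \sup_{f \in \mathcal{F}} N(B(f, 2^{1-i}) \cap \mathcal{F}, 2^{-i}) .$$

Hence 

$$H(\mathcal{F}, 2^{-k}) \le \sum_{i=0}^{k} H(\mathcal{F}, 2^{1-i}, 2^{-i}) .$$

We thus have 

$$H^{1/2}(\mathcal{F}, 2^{-k}) \le \sum_{i=0}^{k} H^{1/2}(\mathcal{F}, 2^{1-i}, 2^{-i}) \le K \sum_{i=0}^{k} 2^{i}\, \omega(\mathcal{F}, 2^{1-i}) ,$$

which concludes the proof. □ 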

Next we present a modification of the previous lemma that can be applied to 
$\delta$-separated subsets. 

Lemma 2. Assume $\mathcal{F}$ is of diameter 1. For all integer $k$ we have 

$$H^{1/2}(\mathcal{F}, 2^{-k}) \le K \sum_{i=0}^{k} 2^{i}\, \omega(\mathcal{F}^{2^{-i-1}}, 2^{2-i}) .$$

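Proof. Notice that for $f \in \mathcal{F}$ there exists $f' \in \mathcal{F}^{\delta/4}$ such that 

$$B(f, \delta) \cap \mathcal{F} \subset B(f', \delta + \delta/4) \cap \mathcal{F} .$$

Moreover, since a maximal $\delta$-separated set is a $\delta$-net, 

$$N(\mathcal{F}, \delta) \le |\mathcal{F}^{\delta}| = N(\mathcal{F}^{\delta}, \delta/2) ,$$

since for a $\delta$-separated set $A$ we have $N(A, \delta/2) = |A|$. 
Let us prove that we have for any $\gamma$, 

$$\big| (B(f, \gamma) \cap \mathcal{F})^{\delta/2} \big| \le \big| B(f, \gamma + \delta/4) \cap \mathcal{F}^{\delta/4} \big| .$$

Indeed, since the points in $\mathcal{F}^{\delta/4}$ form a $\delta/4$ cover of $\mathcal{F}$, all the points in 
$(B(f, \gamma) \cap \mathcal{F})^{\delta/2}$ are at distance less than $\delta/4$ of one and only one point of $\mathcal{F}^{\delta/4}$ 
(the uniqueness comes from the fact that they are $\delta/2$ separated). We can thus 
establish an injection from points in $(B(f, \gamma) \cap \mathcal{F})^{\delta/2}$ to corresponding points 
in $\mathcal{F}^{\delta/4}$, and the image of this injection is included in $B(f, \gamma + \delta/4)$ since the 
image points are within distance $\delta/4$ of points in $B(f, \gamma)$. 
Now we obtain 

$$N\big( (B(f', \delta + \delta/4) \cap \mathcal{F})^{\delta/2}, \delta/4 \big) \le N\big( B(f', 3\delta/2) \cap \mathcal{F}^{\delta/4}, \delta/8 \big) .$$

We thus have 

$$N(B(f, \delta) \cap \mathcal{F}, \delta/2) \le N(B(f', \delta + \delta/4) \cap \mathcal{F}, \delta/2) \le N\big( (B(f', \delta + \delta/4) \cap \mathcal{F})^{\delta/2}, \delta/4 \big) \le N\big( B(f', 3\delta/2) \cap \mathcal{F}^{\delta/4}, \delta/8 \big) .$$

This gives 

$$\sup_{f \in \mathcal{F}} N(B(f, \delta) \cap \mathcal{F}, \delta/2) \le \sup_{f \in \mathcal{F}^{\delta/4}} N\big( B(f, 3\delta/2) \cap \mathcal{F}^{\delta/4}, \delta/8 \big) = N(\mathcal{F}^{\delta/4}, 3\delta/2, \delta/8) .$$

Hence 

$$H(\mathcal{F}, \delta, \delta/2) \le H(\mathcal{F}^{\delta/4}, 3\delta/2, \delta/8) .$$

By the same argument as in the previous lemma we obtain the result. 

□ 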

4.2 Applications 

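Example 3. If for all $\epsilon > 0$, 

$$N(\mathcal{F}, \epsilon) \le \epsilon^{-V} ,$$

then for all $\epsilon > 0$, 

$$H(\mathrm{conv}(\mathcal{F}), \epsilon) \le \epsilon^{-2V/(2+V)} \log^{2V/(2+V)} \epsilon^{-1} .$$

Proof. Recall from Example 1 that 

$$\omega(\mathrm{conv}(\mathcal{F}), \delta) \le K \delta^{2/(2+V)} \log^{V/(2+V)} \delta^{-1} .$$

Now, using Lemma 1 we get 

$$H^{1/2}(\mathrm{conv}(\mathcal{F}), 2^{-k}) \le K \sum_{i=0}^{k} 2^{i}\, 2^{2(1-i)/(2+V)} (i-1)^{V/(2+V)} = K \sum_{i=0}^{k} \big( 2^{V/(2+V)} \big)^{i} (i-1)^{V/(2+V)} .$$

We check that in the above sum, the $i$-th term exceeds the $(i-1)$-th term by a 
constant factor larger than 1 (for $i \ge 2$), so that we can upper bound the sum by a 
constant times the last term, 

$$H^{1/2}(\mathrm{conv}(\mathcal{F}), 2^{-k}) \le K \big( 2^{V/(2+V)} \big)^{k} (k-1)^{V/(2+V)} ,$$

hence, using $\epsilon = 2^{-k}$, we get the result. □ 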
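
The final step of this proof, bounding a geometrically growing sum by a multiple of its last term, can be checked numerically. The sketch below is an illustration with hypothetical names; it starts the sum at i = 2 so that the factor (i - 1) is positive, and uses the ratio q = 2 ** (V / (2 + V)) of the geometric part. Since the polynomial factor is non-decreasing in i, the sum is at most the last term divided by (1 - 1/q).

```python
def dyadic_sum_and_last_term(k, V):
    """Return (sum, last term, q) for the terms q**i * (i - 1)**a, 2 <= i <= k.

    a = V / (2 + V) is the exponent of the logarithmic factor and
    q = 2**a is the ratio of the geometric part.
    """
    a = V / (2.0 + V)
    q = 2.0 ** a
    terms = [q ** i * (i - 1) ** a for i in range(2, k + 1)]
    return sum(terms), terms[-1], q

def last_term_dominates(k, V):
    """Check sum <= last / (1 - 1/q), the geometric-series bound."""
    total, last, q = dyadic_sum_and_last_term(k, V)
    return total <= last / (1.0 - 1.0 / q)
```

The constant 1 / (1 - 1/q) depends only on V, so it can be absorbed into the generic constant of the bound.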

Note that the result we obtain contains an extra logarithmic factor compared 
to the optimal bound [19, 17]. 

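Example 4. If for all $\epsilon > 0$, 

$$H(\mathcal{F}, \epsilon) \le \epsilon^{-V} ,$$

then for all $\epsilon > 0$, for $0 < V < 2$, 

$$H(\mathrm{conv}(\mathcal{F}), \epsilon) \le \epsilon^{-2} \log^{1 - 2/V} \epsilon^{-1} ,$$

for $V = 2$, 

$$H(\mathrm{conv}(\mathcal{F}), \epsilon) \le \epsilon^{-2} \log^{2} \epsilon^{-1} ,$$

and for $V > 2$, 

$$H(\mathrm{conv}(\mathcal{F}), \epsilon) \le \epsilon^{-V} .$$

Proof. The proof is similar to the previous one. □ 

In this example, all the bounds are known to be sharp [6, 11]. 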